4 reasons the Transformer model is best for NLP


By pre-training on a large amount of text, Transformer-based AI architectures become powerful language models that can accurately understand text and make predictions based on text analysis.

Since their initial development in "Attention Is All You Need" (a groundbreaking AI research paper), Transformer-based architectures have completely redefined the field of natural language processing (NLP) and set the state of the art for many AI tasks.

What are Transformer models? They are advanced artificial intelligence models that benefit from an "education" amounting to what dozens of humans might absorb in their lifetimes.

Transformer architectures are usually trained on huge amounts of text in a semi-supervised manner; think English Wikipedia, thousands of books, or even the entire Internet. By digesting these massive text corpora, Transformer-based architectures become powerful language models (LMs) that can accurately understand text and perform predictive analysis based on it.
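To make that concrete, here is a minimal sketch, assuming the Hugging Face transformers library (my choice of toolkit, not one the article names), that probes what such a pre-trained language model has absorbed:

```python
# A minimal sketch: probing what a pre-trained language model has learned.
# Assumes the Hugging Face `transformers` library is installed (pip install transformers).
from transformers import pipeline

# bert-base-uncased was pre-trained on English Wikipedia and a large book corpus
# with a masked-language-modeling objective, a semi-supervised setup in which
# the labels are generated from the raw text itself.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to predict the hidden word from its context.
for prediction in fill_mask("The Transformer is a deep learning [MASK] for text."):
    print(f"{prediction['token_str']!r}  (score: {prediction['score']:.3f})")
```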

In essence, this level of in-depth training enables Transformer models to approximate human text cognition, that is, reading at a meaningful level. In other words, they do not merely understand text; at their best, they form higher-level connections about it.

Recently, it has been shown that these impressive models can also be quickly fine-tuned for higher-level tasks, such as sentiment analysis, duplicate question detection, and other text-based cognitive tasks. Fine-tuning is additional training of the model on an individual dataset or task, allowing slight modification of the network's parameters for the new task relative to the model's initial training.

Typically, this leads to better performance and faster training than training the same model from scratch on the same dataset and task.
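As an illustration, here is a hedged sketch of that fine-tuning workflow, assuming the Hugging Face transformers and datasets libraries; the model and the IMDB reviews dataset are illustrative choices of mine, not the article's specific setup:

```python
# A sketch of fine-tuning a pre-trained Transformer for sentiment analysis.
# Assumes `transformers` and `datasets` are installed; the model and dataset
# names are illustrative picks, not a setup prescribed by the article.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # positive / negative

# Tokenize a small slice of IMDB movie reviews for a quick demonstration.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
test = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

# Only a brief pass is needed: the network already "knows" English, so
# fine-tuning merely adapts its parameters to the new task.
args = TrainingArguments(output_dir="sentiment-model",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train, eval_dataset=test).train()
```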

See also: Top 10 text analysis solutions 

Transformer models are very good at handling the challenges involved in sequence data. They act as an encoder-decoder framework in which the encoder maps the data to a representation space, which the decoder then maps to the output. This lets them scale well on parallel processing hardware such as GPUs, the processors pushing AI software forward.
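To give a feel for that framework, here is a minimal sketch using PyTorch's built-in nn.Transformer module (PyTorch is my assumption here; the article names no specific library):

```python
# A minimal encoder-decoder sketch with PyTorch's nn.Transformer.
# Illustrative only; the dimensions and random tensors are toy values.
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Source and target sequences as embeddings: (sequence_len, batch, d_model).
src = torch.rand(10, 32, 512)  # encoder input, mapped to the representation space
tgt = torch.rand(20, 32, 512)  # decoder input, mapped to the output

out = model(src, tgt)  # all positions are processed in parallel,
print(out.shape)       # torch.Size([20, 32, 512]), which is why GPUs suit it
```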

A pre-trained Transformer can be developed further to quickly perform related tasks. Because the model already has a deep understanding of language, training can focus on learning whatever goal you have in mind, such as named entity recognition, language generation, or some other conceptual focus. Their pre-training makes them particularly versatile and capable.
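As a sketch of how quickly two of the tasks just named can be exercised with an off-the-shelf pre-trained model (again assuming the Hugging Face transformers library; the model choices are mine, not the article's):

```python
# Reusing pre-trained Transformers for two of the tasks named above.
# Assumes the Hugging Face `transformers` library; models are default picks.
from transformers import pipeline

# Named entity recognition: find people, places, and organizations.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Dylan Fox is the CEO of AssemblyAI."))

# Language generation: continue a prompt with a pre-trained LM.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformer models are", max_length=30)[0]["generated_text"])
```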

By fine-tuning a pre-trained Transformer, you can get high performance right out of the box without a large investment. By contrast, training from scratch takes longer and uses more compute and energy to reach the same performance metrics.

Transformer models allow you to take a large-scale LM (language model) trained on a massive amount of text (say, the complete works of Shakespeare) and then update the model for a specific conceptual task that goes far beyond mere "reading," such as sentiment analysis or even predictive analysis.

This often leads to significantly better performance: because the pre-trained model already knows the language very well, it only needs to learn the specific task rather than trying to learn both the language and the task at the same time.

Since their early appearance, Transformers have become the de facto standard for tasks such as question answering, language generation, and named entity recognition. Although it is difficult to predict the future of artificial intelligence, it is reasonable to expect Transformer models to remain closely watched as a next-generation technology.

Arguably most important of all, they allow machine learning models not only to approach the nuances of human reading and comprehension but to exceed them on many levels, in ways that go far beyond mere volume and speed.

Dylan Fox is the CEO of AssemblyAI.
